The present invention relates to production of various components, e.g. proteins, in plants
according to a new method comprising targeting of any nuclear encoded component to the
chloroplast/plastid stroma through the ER (endoplasmic reticulum). The invention further
comprises the recombinant constructs, e.g. vectors comprising the signalling system required
and optionally sequences for glycosylation or other modifications of the component, e.g. in the
ER or chloroplast/plastid.

It is well established that targeting foreign components to plastids improves expression.
Proteins will be used as the model component in the present application, although the invention
can also be utilized with other components, e.g. peptides. Plastid targeted proteins accumulate
up to 30-40% of total soluble protein of the leaf, as compared with 0.01-0.4% for cytosolically
expressed proteins. Plastid targeting also alleviates cytosolic toxicity and other deleterious
effects of the gene products. In addition, chloroplasts can process eukaryotic proteins (for
instance folding, formation of disulfide bridges), which in many cases will eliminate the need
for complicated and expensive in vitro processing of biopharmaceutical proteins produced in
other recombinant systems (e.g. bacteria). For all these reasons the plastid targeting system
is the best for expressing foreign proteins in plants.

With the method according to the present invention, which uses the ER - plastid pathway,
glycosylated proteins can optionally be produced and accumulated in the plastid, in addition
to all the advantages pointed out above. This is of great importance for production of certain
biologically active compounds to be used in therapy and diagnosis, with high requirements with
regard to binding and other functional characteristics. Non-glycosylated antibodies expressed
in plants according to prior art methods are not suitable for use in patients.

The method according to the invention is suitable for expressing/producing, for instance:

• Vaccines (for example hepatitis B virus envelope protein, human cytomegalovirus
glycoprotein B, Norwalk virus capsid protein, etc.), because some of them are glycosylated.

• Antibodies or antibody fragments.

• Pharmaceutical proteins such as signal peptides, protein hormones, structural proteins
such as collagen, blood proteins such as serum albumin, enzymes such as secreted
alkaline phosphatase, etc.

• Industrial enzymes.

• Consumer enzymes, such as enzymes used in washing powder.

• Enzymes that produce a secondary or new metabolite/chemical compound in the
chloroplast.

A DNA construct according to the invention is introduced into a plant transiently or in a
stable manner using any technique for introduction of DNA into plant cells. Such known
methods include, without limitation, Agrobacterium mediated transfer, particle bombardment,
electroporation, chemically induced introduction, conjugation, crossings, protoplast fusions
etc. After transformation into the plant (being any plant, or green algae) individual
transformants are analysed and selected. Such plants could already contain other mutations or
transgenic DNA fragments that could, for example, change the glycosylation pattern of
proteins in the plant.

Arabidopsis CAH1 (U73462) has been found to be of special interest with regard to the
method of the invention, as have sequences in CAH1 for construction of recombinant vectors
for production of various compounds.

The protein sequence of CAH1 is: 

MKIMMMIKLCFFSMSLICIAPADAQTEGVVFGYKGKNGPNQWGHLNPHFTTCAVGKLQSP
IDIQRRQIFYNHKLNSIHREYYFTNATLVNHVCNVAMFFGEGAGDVIIENKNYTLLQMHW
HTPSEHHLHGVQYAAELHMVHQAKDGSFAVVASLFKIGTEEPFLSQMKEKLVKLKEERLK
GNHTAQVEVGRIDTRHIERKTRKYYRYIGSLTTPPCSENVSWTILGKVRSMSKEQVELLR
SPLDTSFKNNSRPCQPLNGRRVEMFHDHERVDKKETGNKKKKPN

with the corresponding DNA sequence:
The open reading frame is underlined.

1 atgcagtaat ctgataaaac cctccacaga galitccaac aaaacaggaa ctaaaacaca
61 an atgaaaat tatgatgatg attaagctct gcttcttctc cgtgtccctc atctgcattg
121 cacctgcaga tgctcagaca gaaggagtag tgtttggata taaaggcaaa aatggaccaa
181 accaatgggg acacttaaac cctcacttca ccacatgcgc ggtcggtaaa ttgcaatctc
241 caattgatat tm a*fif>*<*p caaatatttt acaaccacaa attgaattca atacaccgtg
301 aatactactt cacaaacgca acactagtga accacgtctg taatgttgcc atgttcttcg
361 gggagggagc aggtgatgtg ataatacaaa acaagaacta taccttactg caaatgcatt
421 ggcacactcc ttctgaacat cacctccatg gagtccaata tgcagctgag ctgcacatgg
481 tacaccaagc aaaagatgga agctttgctg tggtggcaag tctcttcaaa atcggcactg
541 aagagccttt cctctctcag atgaaggaga aattejatjrda gctaaaggaa gagagactca
601 aagggaacca cacagcacaa gtggaagtag gaagaatcga cacaagacac attgaacgta
661 agactcgaaa gtactacaga tacattggtt cactcactac tcctccttgc tccgagaacg
721 tttcttggac catccttggc aaggtgaggt caatgtcaaa ggaacaagta gaactactca
781 gatctccatt ggacacttct ttcaagaaca attcaagacc gtgtcaaccc ctcaacggcc
841 ggagagttga gatgttccac gaccacgagc gtgtcgataa aaaagaaacc ggtaacaaaa
901 agaaaaaacc caattaaaat agttttacat tgtctattgg tttgtttaga accctaatta
961 gctttgtaaa actaataatc tcttatgtag tactgtgttg ttgntacga cttgatatac
1021 gatttccaaa aaaaaaaaaa aaaaaa

We have concluded that CAH1 has an N-terminal signal peptide that targets the protein to the
ER. The results of our studies of the mechanism further suggest that the mature stroma
(chloroplast/plastid stroma) CAH1 protein is N-glycosylated, and also that this protein is not
the only glycosylated protein in Arabidopsis chloroplasts/plastids. Interestingly, the
occurrence of potential N-glycosylation sites is not the only common feature among the
glycosylated stroma proteins. Based on comparisons with other such proteins we conclude
that the C-terminus, which is highly hydrophilic and charged, including lysine residues, seems
to be important for the import mechanism of these proteins into the plastid.
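
As an illustration of what is meant by potential N-glycosylation sites, the following minimal sketch scans a protein sequence for the consensus sequon N-X-S/T, X being any residue except proline. The sequon rule is general background and is not specified in this application; the name find_sequons is hypothetical; and the CAH1 string is a transcription of the sequence given in this application that should be checked against the database entry (U73462) before any real use. A match only marks a candidate site and does not show that the site is glycosylated in the plant.

```python
import re

# CAH1 protein sequence (284 aa), transcribed from this application.
# Assumption: verify against the database entry U73462 before relying on it.
CAH1 = (
    "MKIMMMIKLCFFSMSLICIAPADAQTEGVVFGYKGKNGPNQWGHLNPHFTTCAVGKLQSP"
    "IDIQRRQIFYNHKLNSIHREYYFTNATLVNHVCNVAMFFGEGAGDVIIENKNYTLLQMHW"
    "HTPSEHHLHGVQYAAELHMVHQAKDGSFAVVASLFKIGTEEPFLSQMKEKLVKLKEERLK"
    "GNHTAQVEVGRIDTRHIERKTRKYYRYIGSLTTPPCSENVSWTILGKVRSMSKEQVELLR"
    "SPLDTSFKNNSRPCQPLNGRRVEMFHDHERVDKKETGNKKKKPN"
)

# N-X-S/T with X != proline. The lookahead leaves X and S/T unconsumed,
# so overlapping candidates (e.g. "NNS") are still reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return the 1-based position of the Asn in every N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    for pos in find_sequons(CAH1):
        # Print the position together with the three-residue sequon itself.
        print(pos, CAH1[pos - 1:pos + 2])
```

The same function can be applied to a protein of interest before and after sites are added or removed, to confirm that the intended sequons are present in the final coding sequence.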

A typical expression construct based on the sequences from the CAH1 system could contain 
the following parts: 

• A selectable marker, under an appropriate promoter, to facilitate selection of the
transgenes.

• A promoter to drive the expression of the gene of interest. This promoter could be
chosen among known promoters or promoters optimised for use in plant systems, e.g.
a constitutive promoter such as the CaMV 35S promoter (or variants of it), or it could be
an inducible promoter such as a heat inducible promoter. For use of certain inducible
promoters an introduced transcription factor (natural or constructed) is required, and
such a transcription factor could be under the control of a different promoter. The
promoter could also be tissue specific, such as seed specific, leaf specific etc., and/or
specifically expressed at different times (developmentally, seasonally, diurnally).

• A 5' untranslated region (UTR) can be added to the construct in order to control the
translational initiation efficiency and transcript stability.

• The protein coding part of the construct starts with an ER signal peptide, e.g. an
approximately 24 amino acid (aa) sequence from the CAH1 gene described below,
followed by a chloroplast transit peptide like sequence (for example aa 25-75 of the
CAH1 gene). After this an endoprotease site, or other site, can be added to facilitate the
removal of any remaining signal sequences after processing. Then the sequence
coding for the desired protein to be expressed is inserted. Glycosylation sites can be
added to or removed from this sequence depending on the need for
glycosylation of the final product. Then the same or a different endoprotease or similar
site can be added before the C-terminal sequence consisting of the last 61 aa of the
CAH1 gene.

• A 3' UTR and a terminator can be added, which would facilitate transcript termination,
polyadenylation and transcript stability.

• After harvest the protein or the chemical compound produced has to be purified. This
is facilitated by the fact that one can first purify the chloroplasts and then purify the
protein from the chloroplasts. The protein could also be purified from, for example, a
whole leaf.

In a reduction to practice experiment, transient expression of Arabidopsis CAH1 fused to the
green fluorescent protein (GFP) was performed in Arabidopsis and tobacco cells. As
expected, GFP expressed alone (negative control) was distributed uniformly in the
cytosol and in the nucleus, whereas the plastid control (transit sequence of RbcS fused to GFP)
was targeted to the chloroplast. Transient expressions of the complete CAH1 protein were
then performed. Plastid localization was obtained with the CAH1 protein when GFP was
fused to its C-terminus. To further examine the domain required for plastid
localization of the CAH1 protein, we generated several versions of the CAH1 protein and
tested transient expression of the corresponding GFP fusions in Arabidopsis and BY2
tobacco cells. The signal peptide of CAH1 directed GFP to the ER in Arabidopsis protoplasts.

Example of a construct (GFP is the reporter gene, which should be exchanged for
any protein of interest); this construct will target GFP to the plastids.



CaMV35S - CAH1 114 aa - GFP - CAH1 61 aa - nosT

1 mkimmmiklc ffsmslicia padaqtegvv fgykgkngpn qwghlnphft tcavgklqsp

61 idiqrrqify nhklnsihre yyftnatlvn hvcnvamffg egagdviien knyt

+ GFP (or the gene to be produced) +

224 ilgkvrs mskeqvellr

241 spldtsfknn srpcqplngr rvemfhdher vdkketgnkk kkpn
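
The layout of the protein coding part of this example construct can be sketched as a simple concatenation. The following is a minimal illustration only: the class FusionDesign, its field names and the placeholder cargo are hypothetical, the two CAH1 segments are transcribed from the residues printed in this example (aa 1-114 and the last 61 aa), and the optional cleavage site corresponds to the endoprotease site mentioned in the list of construct parts. It handles amino acid strings only and says nothing about codon usage or cloning.

```python
from dataclasses import dataclass

# CAH1 residues 1-114 (ER signal peptide plus downstream residues) and the
# last 61 residues, transcribed from the example construct in this application.
CAH1_N114 = (
    "MKIMMMIKLCFFSMSLICIAPADAQTEGVVFGYKGKNGPNQWGHLNPHFTTCAVGKLQSP"
    "IDIQRRQIFYNHKLNSIHREYYFTNATLVNHVCNVAMFFGEGAGDVIIENKNYT"
)
CAH1_C61 = "ILGKVRSMSKEQVELLRSPLDTSFKNNSRPCQPLNGRRVEMFHDHERVDKKETGNKKKKPN"


@dataclass(frozen=True)
class FusionDesign:
    """Hypothetical container for the parts of the fusion protein."""

    n_terminal: str          # CAH1 N-terminal segment (starts with ER signal)
    cargo: str               # protein of interest (GFP in the example)
    c_terminal: str          # CAH1 C-terminal segment
    cleavage_site: str = ""  # optional endoprotease site on both sides of cargo

    def protein(self) -> str:
        """Concatenate the parts in N- to C-terminal order."""
        site = self.cleavage_site
        return self.n_terminal + site + self.cargo + site + self.c_terminal


if __name__ == "__main__":
    # "MAGSAGS" is a made-up stand-in for the cargo, not a real protein.
    design = FusionDesign(CAH1_N114, "MAGSAGS", CAH1_C61)
    fusion = design.protein()
    print(len(fusion), fusion[:24], fusion[-12:])
```

Keeping the three parts as separate fields makes it straightforward to exchange the cargo or to test shorter N- or C-terminal segments without editing a single long string.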



Accordingly, a recombinant construct of the invention comprises an ER signal sequence and a
plastid signalling sequence. Examples of such sequences are found in CAH1 (sequence given
above) and analogues.

The predicted minimal ER signalling sequence has been found to be an N-terminal amino acid
sequence of about 24 amino acids. Examples of ER signalling sequences include, without
limitation:

MKIMMMIKLCFFSMSLICIAPADA     CAH1 Arabidopsis

MAASHGNAIFVLTJ.CTLFLPSLAC    CAH1 Rice

MAARTGIFSVFVAVLLSISAFSSA     Ribophorin I Arabidopsis

In addition to the ER signal sequence, a sequence necessary for localisation to the stroma in
the chloroplast, i.e. to the next and final destination in the plant, is required: a C-terminal
sequence of the CAH1 protein or a functional analogue. At present we believe that such a minimal
sequence from the CAH1 protein system comprises a 12-15 amino acid sequence or a
functional analogue. This sequence could probably also be located directly after the ER
signal, even though in the presently preferred embodiment it is located downstream of the
desired protein. Examples of such sequences include, without limitation, the following
sequences or functional derivatives thereof. Such functional derivatives are characterised by
comprising 3-4 lysines in a row.

KKETGNKKKKPN    CAH1 Arabidopsis

RFWGKKKRRSSP    CAH1 Rice

TGKKKKKTYLP     Ribophorin I Arabidopsis
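
The criterion of several lysines in a row can be checked mechanically for a candidate C-terminal sequence. The sketch below is an illustration only: the function names and the default threshold of three are assumptions chosen to mirror the wording above, and a lysine run by itself is not claimed to be sufficient for plastid targeting.

```python
import re

# C-terminal example sequences as listed in this application.
C_TERMINAL_EXAMPLES = {
    "CAH1 Arabidopsis": "KKETGNKKKKPN",
    "CAH1 Rice": "RFWGKKKRRSSP",
    "Ribophorin I Arabidopsis": "TGKKKKKTYLP",
}


def longest_lysine_run(seq: str) -> int:
    """Length of the longest uninterrupted stretch of lysine (K) residues."""
    runs = re.findall(r"K+", seq.upper())
    return max((len(run) for run in runs), default=0)


def has_lysine_stretch(seq: str, minimum: int = 3) -> bool:
    """True if the sequence contains at least `minimum` lysines in a row."""
    return longest_lysine_run(seq) >= minimum


if __name__ == "__main__":
    for name, seq in C_TERMINAL_EXAMPLES.items():
        print(name, seq, longest_lysine_run(seq), has_lysine_stretch(seq))
```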